108 research outputs found

    P-splines with derivative based penalties and tensor product smoothing of unevenly distributed data

    The P-splines of Eilers and Marx (1996) combine a B-spline basis with a discrete quadratic penalty on the basis coefficients to produce a reduced-rank, spline-like smoother. P-splines have three properties that make them very popular as reduced-rank smoothers: i) the basis and the penalty are sparse, enabling efficient computation, especially for Bayesian stochastic simulation; ii) it is possible to flexibly 'mix-and-match' the order of B-spline basis and penalty, rather than the order of penalty controlling the order of the basis as in spline smoothing; iii) it is very easy to set up the B-spline basis functions and penalties. The discrete penalties are somewhat less interpretable in terms of function shape than the traditional derivative-based spline penalties, but tend towards penalties proportional to traditional spline penalties in the limit of large basis size. However, part of the point of P-splines is not to use a large basis size. In addition, the spline basis functions arise from solving functional optimization problems involving derivative-based penalties, so moving to discrete penalties for smoothing may not always be desirable. The purpose of this note is to point out that the three properties of basis-penalty sparsity, mix-and-match penalization and ease of setup are readily obtainable with B-splines subject to derivative-based penalization. The penalty setup typically requires a few lines of code, rather than the two lines typically required for P-splines, but this one-off disadvantage seems to be the only one associated with using derivative-based penalties. As an example application, it is shown how basis-penalty sparsity enables efficient computation with tensor product smoothers of scattered data.
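
The discrete P-spline penalty described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `difference_matrix`, `penalty_matrix` and `quadratic_penalty` are hypothetical, and the sketch covers only the difference penalty on the coefficients, not the B-spline basis or the derivative-based penalty the note proposes.

```python
def difference_matrix(n_coef, order=2):
    """Return the order-th difference matrix D as a list of rows."""
    # Start from the identity and difference adjacent rows `order` times.
    D = [[1.0 if i == j else 0.0 for j in range(n_coef)] for i in range(n_coef)]
    for _ in range(order):
        D = [[b - a for a, b in zip(D[i], D[i + 1])] for i in range(len(D) - 1)]
    return D


def penalty_matrix(n_coef, order=2):
    """Return P = D^T D, the matrix of the discrete quadratic penalty."""
    D = difference_matrix(n_coef, order)
    return [[sum(row[i] * row[j] for row in D) for j in range(n_coef)]
            for i in range(n_coef)]


def quadratic_penalty(beta, order=2):
    """Return the penalty value ||D beta||^2 for coefficients beta."""
    D = difference_matrix(len(beta), order)
    return sum(sum(d * b for d, b in zip(row, beta)) ** 2 for row in D)
```

For a second-order penalty, entries of `penalty_matrix` more than two positions off the diagonal are zero, which is the sparsity the abstract refers to, and coefficients that lie on a straight line incur zero penalty.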

    Two dimensional smoothing via an optimised Whittaker smoother

    Background: In many applications where moderate to large datasets are used, plotting relationships between pairs of variables can be problematic. A large number of observations will produce a scatter-plot which is difficult to investigate due to a high concentration of points on a simple graph. In this article we review the Whittaker smoother for enhancing scatter-plots and smoothing data in two dimensions. To optimise the behaviour of the smoother an algorithm is introduced, which is easy to programme and computationally efficient. Results: The methods are illustrated using a simple dataset and simulations in two dimensions. Additionally, a noisy mammography image is analysed. When smoothing scatter-plots, the Whittaker smoother is a valuable tool that produces enhanced images that are not distorted by the large number of points. The method is also useful for sharpening patterns or removing noise in distorted images. Conclusion: The Whittaker smoother can be a valuable tool for producing better visualisations of big data or filtering distorted images. The suggested optimisation method is easy to programme and can be applied with low computational cost.
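
As a rough illustration of the smoother this abstract reviews, the sketch below implements the basic one-dimensional Whittaker smoother, which minimises the sum of squared residuals plus `lam` times the sum of squared differences of the fitted values. It is an assumption-laden sketch, not the paper's method: the two-dimensional extension and the optimisation algorithm are not reproduced, and the name `whittaker_smooth` and its parameters are hypothetical.

```python
def whittaker_smooth(y, lam=10.0, order=2):
    """Solve (I + lam * D^T D) z = y, where D is the order-th difference matrix."""
    n = len(y)
    # Build D by differencing the rows of the identity `order` times.
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        D = [[b - a for a, b in zip(D[i], D[i + 1])] for i in range(len(D) - 1)]
    # Assemble A = I + lam * D^T D (symmetric positive definite).
    A = [[(1.0 if i == j else 0.0) + lam * sum(r[i] * r[j] for r in D)
          for j in range(n)] for i in range(n)]
    z = [float(v) for v in y]
    # Forward elimination; no pivoting is needed for a positive definite matrix.
    for k in range(n):
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            if f != 0.0:
                for j in range(k, n):
                    A[i][j] -= f * A[k][j]
                z[i] -= f * z[k]
    # Back substitution.
    for k in range(n - 1, -1, -1):
        z[k] = (z[k] - sum(A[k][j] * z[j] for j in range(k + 1, n))) / A[k][k]
    return z
```

Larger `lam` gives a smoother result. The dense solve is for clarity only; a practical version would exploit the banded structure of the system.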

    Forecasting Player Behavioral Data and Simulating in-Game Events

    Understanding player behavior is fundamental in game data science. Video games evolve as players interact with the game, so being able to foresee player experience would help ensure successful game development. In particular, game developers need to evaluate beforehand the impact of in-game events. Simulation optimization of these events is crucial to increase player engagement and maximize monetization. We present an experimental analysis of several methods to forecast game-related variables, with two main aims: to obtain accurate predictions of in-app purchases and playtime in an operational production environment, and to perform simulations of in-game events in order to maximize sales and playtime. Our ultimate purpose is to take a step towards the data-driven development of games. The results suggest that, even though the performance of traditional approaches such as ARIMA is still better, the outcomes of state-of-the-art techniques like deep learning are promising. Deep learning emerges as a well-suited general model that could be used to forecast a variety of time series with different dynamic behaviors.

    Dynamical density delay maps: simple, new method for visualising the behaviour of complex systems

    Background. Physiologic signals, such as cardiac interbeat intervals, exhibit complex fluctuations. However, capturing important dynamical properties, including nonstationarities, may not be feasible from conventional time series graphical representations. Methods. We introduce a simple-to-implement visualisation method, termed dynamical density delay mapping ("D3-Map" technique), that provides an animated representation of a system's dynamics. The method is based on a generalization of conventional two-dimensional (2D) Poincaré plots, which are scatter plots where each data point, x(n), in a time series is plotted against the adjacent one, x(n+1). First, we divide the original time series, x(n) (n=1,..., N), into a sequence of segments (windows). Next, for each segment, a three-dimensional (3D) Poincaré surface plot of x(n), x(n+1), h_{x(n),x(n+1)} is generated, in which the third dimension, h, represents the relative frequency of occurrence of each (x(n),x(n+1)) point. This 3D Poincaré surface is then chromatised by mapping the relative frequency h values onto a colour scheme. We also generate a colourised 2D contour plot from each time series segment using the same colourmap scheme as for the 3D Poincaré surface. Finally, the original time series graph, the colourised 3D Poincaré surface plot, and its projection as a colourised 2D contour map for each segment, are animated to create the full "D3-Map." Results. We first exemplify the D3-Map method using the cardiac interbeat interval time series from a healthy subject during sleeping hours. The animations uncover complex dynamical changes, such as transitions between states, and the relative amount of time the system spends in each state. We also illustrate the utility of the method in detecting hidden temporal patterns in the heart rate dynamics of a patient with atrial fibrillation. The videos, as well as the source code, are made publicly available. Conclusions. Animations based on density delay maps provide a new way of visualising dynamical properties of complex systems not apparent in time series graphs or standard Poincaré plot representations. Trainees in a variety of fields may find the animations useful as illustrations of fundamental but challenging concepts, such as nonstationarity and multistability. For investigators, the method may facilitate data exploration.
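
The windowing and relative-frequency steps described in this abstract can be sketched as follows. This is an illustrative reading of the abstract, not the authors' published source code: the binning scheme, the function names `delay_density` and `windowed_densities`, and the parameters `n_bins`, `window` and `step` are all assumptions, and the colour mapping and animation are left out.

```python
from collections import Counter


def delay_density(x, n_bins=10, lo=None, hi=None):
    """Relative frequency h of each binned (x(n), x(n+1)) pair in one segment."""
    lo = min(x) if lo is None else lo
    hi = max(x) if hi is None else hi
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant signal

    def bin_of(v):
        # Clamp so that values at or beyond the edges land in the outer bins.
        return max(0, min(int((v - lo) / width), n_bins - 1))

    counts = Counter((bin_of(a), bin_of(b)) for a, b in zip(x, x[1:]))
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}


def windowed_densities(x, window, step, n_bins=10):
    """One density map per segment, using a shared bin range across segments."""
    lo, hi = min(x), max(x)
    return [delay_density(x[s:s + window], n_bins, lo, hi)
            for s in range(0, len(x) - window + 1, step)]
```

Each returned dictionary maps a bin pair to its relative frequency. Plotting those values as heights or colours, one frame per segment, would give the animated surface the abstract describes.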

    Analysis of Overlapped and Noisy Hydrogen/Deuterium Exchange Mass Spectra

    This document is the Accepted Manuscript version of a Published Work that appeared in final form in the Journal of the American Chemical Society, copyright © American Chemical Society after peer review and technical editing by the publisher. To access the final edited and published work see http://doi.org/10.1007/s13361-013-0727-5. Noisy and overlapped mass spectrometry data hinder the sequence coverage that can be obtained from hydrogen/deuterium exchange analysis, and place a limit on the complexity of the samples that can be studied by this technique. Advances in instrumentation have addressed these limits, but as the complexity of the biological samples under investigation increases, these problems are reencountered. Here we describe the use of binomial distribution fitting with asymmetric linear squares regression for calculating the accurate deuterium content for mass envelopes of low signal or that contain significant overlap. The approach is demonstrated with a test data set of HIV Env gp140 wherein inclusion of the new analysis regime resulted in obtaining exchange data for 42 additional peptides, improving the sequence coverage by 11%. At the same time, the precision of deuterium uptake measurements was improved for nearly every peptide examined. The improved processing algorithms also provide an efficient method for deconvolution of bimodal mass envelopes and EX1 kinetic signatures. All these functions and visualization tools have been implemented in the new version of the freely available software, HX-Express v2.

    Applied immuno-epidemiological research: an approach for integrating existing knowledge into the statistical analysis of multiple immune markers.

    BACKGROUND: Immunologists often measure several correlated immunological markers, such as concentrations of different cytokines produced by different immune cells and/or measured under different conditions, to draw insights from complex immunological mechanisms. Although there have been recent methodological efforts to improve the statistical analysis of immunological data, a framework is still needed for the simultaneous analysis of multiple, often correlated, immune markers. This framework would allow the immunologists' hypotheses about the underlying biological mechanisms to be integrated. RESULTS: We present an analytical approach for statistical analysis of correlated immune markers, such as those commonly collected in modern immuno-epidemiological studies. We demonstrate i) how to deal with interdependencies among multiple measurements of the same immune marker, ii) how to analyse association patterns among different markers, iii) how to aggregate different measures and/or markers to immunological summary scores, iv) how to model the inter-relationships among these scores, and v) how to use these scores in epidemiological association analyses. We illustrate the application of our approach to multiple cytokine measurements from 818 children enrolled in a large immuno-epidemiological study (SCAALA Salvador), which aimed to quantify the major immunological mechanisms underlying atopic diseases or asthma. We demonstrate how to aggregate systematically the information captured in multiple cytokine measurements to immunological summary scores aimed at reflecting the presumed underlying immunological mechanisms (Th1/Th2 balance and immune regulatory network). We show how these aggregated immune scores can be used as predictors in regression models with outcomes of immunological studies (e.g. specific IgE) and compare the results to those obtained by a traditional multivariate regression approach. CONCLUSION: The proposed analytical approach may be especially useful to quantify complex immune responses in immuno-epidemiological studies, where investigators examine the relationship among epidemiological patterns, immune response, and disease outcomes.

    Attainment rate as a surrogate indicator of the intervertebral neutral zone length in lateral bending: An in vitro proof of concept study

    Background: Lumbar segmental instability is often considered to be a cause of chronic low back pain. However, defining its measurement has been largely limited to laboratory studies. These have characterised segmental stability as the intrinsic resistance of spine specimens to initial bending moments by quantifying the dynamic neutral zone. However, these measurements have been impossible to obtain in vivo without invasive procedures, preventing the assessment of intervertebral stability in patients. Quantitative fluoroscopy (QF) measures the initial velocity of the attainment of intervertebral rotational motion in patients, which may to some extent be representative of the dynamic neutral zone. This study sought to explore the possible relationship between the dynamic neutral zone and intervertebral rotational attainment rate as measured with QF in an in vitro preparation. The purpose was to find out if further work into this concept is worth pursuing. Method: This study used passive recumbent QF in a multi-segmental porcine model. This assessed the intrinsic intervertebral responses to a minimal coronal plane bending moment as measured with a digital force gauge. Bending moments about each intervertebral joint were calculated and correlated with the rate at which global motion was attained at each intervertebral segment in the first 10° of global motion where the intervertebral joint was rotating. Results: Unlike previous studies of single segment specimens, a neutral zone was found to exist during lateral bending. The initial attainment rates for left and right lateral flexion were comparable to previously published in vivo values for healthy controls. Substantial and highly significant levels of correlation between initial attainment rate and neutral zone were found for left (Rho = 0.75, P = 0.0002) and combined left-right bending (Rho = 0.72, P = 0.0001) and moderate ones for right alone (Rho = 0.55, P = 0.0012). Conclusions: This study found good correlation between the initial intervertebral attainment rate and the dynamic neutral zone, thereby opening the possibility of detecting segmental instability from clinical studies. However, the results must be treated with caution. Further studies with multiple specimens and adding sagittal plane motion are warranted.

    Applying Spatial Copula Additive Regression to Breast Cancer Screening Data

    Breast cancer is associated with several risk factors. Although genetics is an important breast cancer risk factor, environmental and sociodemographic characteristics, which may differ across populations, are also factors to be taken into account when studying the disease. These factors, apart from having a role as direct agents in the risk of the disease, can also influence other variables that act as risk factors. The age at menarche and the reproductive lifespan are considered in the literature to be breast cancer risk factors, so several studies aim to analyze the trend of age at menarche and menopause across generations. Also, it is believed that these two moments in a woman's life can be affected by her environment, social status, and lifestyle. Using the records of 278,282 women who entered the breast cancer screening program in Central Portugal, we developed a bivariate copula model to quantify the effect of a woman's year of birth on the association between age at menarche and a woman's reproductive lifespan, in addition to exploring any possible effect of geographic location on these variables and their association. For this analysis we employ Copula Generalized Additive Models for Location, Scale and Shape (CGAMLSS), and the inference was carried out using the R package SemiParBIVProbit.

    Automated smoother for the numerical decoupling of dynamics models

    Background: Structure identification of dynamic models for complex biological systems is the cornerstone of their reverse engineering. Biochemical Systems Theory (BST) offers a particularly convenient solution because its parameters are kinetic-order coefficients which directly identify the topology of the underlying network of processes. We have previously proposed a numerical decoupling procedure that allows the identification of multivariate dynamic models of complex biological processes. While described here within the context of BST, this procedure has a general applicability to signal extraction. Our original implementation relied on artificial neural networks (ANN), which caused slight, undesirable bias during the smoothing of the time courses. As an alternative, we propose here an adaptation of Whittaker's smoother and demonstrate its role within a robust, fully automated structure identification procedure. Results: In this report we propose a robust, fully automated solution for signal extraction from time series, which is the prerequisite for the efficient reverse engineering of biological systems models. Whittaker's smoother is reformulated within the context of information theory and extended by the development of adaptive signal segmentation to account for heterogeneous noise structures. The resulting procedure can be used on arbitrary time series with a nonstationary noise process; it is illustrated here with metabolic profiles obtained from in-vivo NMR experiments. The smoothed solution that is free of parametric bias permits differentiation, which is crucial for the numerical decoupling of systems of differential equations. Conclusion: The method is applicable in signal extraction from time series with nonstationary noise structure and can be applied in the numerical decoupling of systems of differential equations into algebraic equations, and thus constitutes a rather general tool for the reverse engineering of mechanistic model descriptions from multivariate experimental time series.